shorter sequence is likely to occur, just by chance, a sufficient number of additional times to provide false signals. The minimum length required for unique recognition increases with the size of the genome.) The 12 bp sequence need not be contiguous; and, in fact, if a specific number of base pairs separates two constant shorter sequences, their combined length could be less than 12 bp, since the distance of separation itself provides a part of the signal (even if the intermediate sequence is itself irrelevant).
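The length argument can be made concrete with a back-of-envelope calculation. The sketch below assumes a random genome with equal base frequencies and independent positions, and counts both strands, so a specific n bp site is expected about 2G/4^n times by chance in a genome of G bp. The genome size used (4.6 × 10^6 bp, roughly that of E. coli) and the function names are illustrative assumptions, not figures or methods from this text.

```python
def expected_chance_matches(site_length: int, genome_size: int) -> float:
    """Expected number of chance occurrences of one specific site.

    Assumes equal base frequencies (each base has probability 1/4),
    independent positions, and that both strands are searched.
    """
    return 2 * genome_size / 4 ** site_length


def min_unique_length(genome_size: int) -> int:
    """Shortest site expected to occur less than once by chance."""
    n = 1
    while expected_chance_matches(n, genome_size) >= 1:
        n += 1
    return n


# Illustrative genome size (an assumption): roughly 4.6 million bp.
GENOME_BP = 4_600_000

for n in (8, 10, 12):
    print(f"{n} bp site: ~{expected_chance_matches(n, GENOME_BP):.2f} chance matches")
print("minimum unique length:", min_unique_length(GENOME_BP))

# Two 6 bp elements at a fixed separation must both match, so their joint
# chance frequency equals that of one contiguous 12 bp site (4**-12).
assert 4 ** 6 * 4 ** 6 == 4 ** 12
```

Under these assumptions an 11 bp site is still expected about twice by chance, while a 12 bp site drops below one occurrence; a larger genome pushes the minimum length up, as the text states.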

Attempts to identify the features in DNA that are necessary for RNA polymerase binding started by comparing the sequences of different promoters. Any essential nucleotide sequence should be present in all the promoters. Such a sequence is said to be conserved. However, a conserved sequence need not necessarily be conserved at every single position; some variation is permitted. How do we analyze a sequence of DNA to determine whether it is sufficiently conserved to constitute a recognizable signal?

Putative DNA recognition sites can be defined in terms of an idealized sequence that represents the base most often present at each position. A consensus sequence is defined by aligning all known examples so as to maximize their homology. For a sequence to be accepted as a consensus, each particular base must be reasonably predominant at its position, and most of the actual examples must be related to the consensus by rather few substitutions, say, no more than one or two.
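The procedure just described can be sketched in a few lines, assuming the examples have already been aligned to equal length. The function names, the rule of writing N when two bases tie for most frequent, and the five short example sequences are all hypothetical and made up for illustration; they are not real promoter data.

```python
from collections import Counter


def consensus(aligned: list[str]) -> tuple[str, list[int]]:
    """Consensus base and its percent occurrence at each aligned position.

    Assumes the sequences are already aligned to the same length. If two
    bases tie for most frequent, the position is written as N (no single
    predominant base); the tied percentage is still reported.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    bases, percents = [], []
    for column in zip(*aligned):
        ranked = Counter(column).most_common()
        base, count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == count:
            base = "N"  # tie: no predominant base at this position
        bases.append(base)
        percents.append(round(100 * count / len(aligned)))
    return "".join(bases), percents


def substitutions(seq: str, cons: str) -> int:
    """Number of positions at which one example differs from the consensus."""
    return sum(a != b for a, b in zip(seq, cons))


# Made-up aligned hexamers, for illustration only.
examples = ["TATAAT", "TATGAT", "TAAAAT", "GATACT", "TATAAT"]
cons, pct = consensus(examples)
print(cons, pct)  # TATAAT [80, 100, 80, 80, 80, 100]
print([substitutions(s, cons) for s in examples])  # [0, 1, 1, 2, 0]
```

Every example here lies within two substitutions of the consensus, which is the kind of relationship the criterion above asks for.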

More than 100 promoters have been sequenced in E. coli, and a striking feature is the lack of any extensive conservation of sequence over the 60 bp associated with RNA polymerase. The sequence of much of the binding site is irrelevant. But some short stretches within the promoter are conserved, and they are critical for its function. Conservation of only very short consensus sequences is a typical feature of regulatory sites (such as promoters) in both prokaryotic and eukaryotic genomes.

There are four conserved features in a bacterial promoter: the startpoint; the -10 sequence; the -35 sequence; and the distance between the -10 and -35 sequences:



♦ The startpoint is usually (>90% of the time) 
a purine. It is common for the startpoint to 
be the central base in the sequence CAT, but 
the conservation of this triplet is not great 
enough to regard it as an obligatory signal. 

♦ Just upstream of the startpoint, a 6 bp region 
is recognizable in almost all promoters. The 
center of the hexamer generally is close to
10 bp upstream of the startpoint; the distance
varies in known promoters from position -18
to -9. Named for its location, the hexamer is
often called the -10 sequence. Its consensus
is TATAAT, which can be summarized in the
form

$$T_{80}\,A_{95}\,T_{45}\,A_{60}\,A_{50}\,T_{96}$$

where the subscript denotes the percent 
occurrence of the most frequently found base, 
varying from 45-96%. (A position at which 
there is no discernible preference for any 
base would be indicated by N.) If the fre- 
quency of occurrence indicates likely impor- 
tance in binding RNA polymerase, we would 
expect the initial highly conserved TA and 
the final almost completely conserved T in 
the -10 sequence to be the most important 
bases. 

♦ Another conserved hexamer is centered about 35
bp upstream of the startpoint. This is called
the -35 sequence. The consensus is TTGACA; 
in more detailed form, the conservation is 

$$T_{82}\,T_{84}\,G_{78}\,A_{65}\,C_{54}\,A_{45}$$

♦ The distance separating the -35 and -10 sites 
is between 16 and 18 bp in 90% of promoters; 
in the exceptions, it is as little as 15 or as 
great as 20 bp. Although the actual sequence 
in the intervening region is unimportant, the 
distance is critical in holding the two sites 
at the appropriate separation for the geometry 
of RNA polymerase.
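The conserved features listed above suggest a naive way to flag promoter-like stretches in a DNA string: look for a TTGACA-like hexamer followed, after a 15 to 20 bp spacer, by a TATAAT-like hexamer. The sketch below is a simple consensus-matching illustration, not a reliable promoter predictor; the function names, the mismatch cutoff, and the ranking rule (fewest mismatches first, then spacing closest to 17 bp) are hypothetical choices. It scans one strand only and ignores the startpoint.

```python
MINUS35 = "TTGACA"  # -35 consensus hexamer
MINUS10 = "TATAAT"  # -10 consensus hexamer


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(
    seq: str, max_mismatches: int = 1, spacings: range = range(15, 21)
) -> list[tuple[int, int, int, int]]:
    """Return (start of -35 box, spacer length, start of -10 box, total mismatches).

    Each hexamer may differ from its consensus by at most `max_mismatches`
    bases (a hypothetical cutoff). Only the given strand is scanned.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mismatches:
            continue
        for spacing in spacings:
            j = i + 6 + spacing  # first base of the candidate -10 box
            box10 = seq[j:j + 6]
            if len(box10) < 6:
                break  # ran off the end of the sequence
            mm10 = mismatches(box10, MINUS10)
            if mm10 <= max_mismatches:
                hits.append((i, spacing, j, mm35 + mm10))
    # Rank: fewest mismatches first, then spacing closest to 17 bp.
    hits.sort(key=lambda h: (h[3], abs(h[1] - 17)))
    return hits


# Synthetic test sequence: both consensus boxes separated by 17 bp.
demo = "GC" + MINUS35 + "C" * 17 + MINUS10 + "GCGCGCGCGC"
print(find_promoter_candidates(demo, max_mismatches=0))  # [(2, 17, 25, 0)]
```

Raising `max_mismatches` finds more degenerate matches, but because most of a 60 bp binding site is unconstrained, such a scan also produces many false positives.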

From data collected on many promoters, we 
can define the optimal promoter as a sequence 
consisting of the -35 hexamer, separated by 17 
bp from the -10 hexamer, lying 7 bp upstream
of the startpoint. The structure of a promoter, 
showing the permitted range of variation from 
this optimum, is illustrated in Figure 11.16. 
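The optimal arrangement can be laid out explicitly to show which coordinates it implies. This sketch assumes the usual numbering convention (startpoint = +1, upstream bases negative, no position 0) and reads "lying 7 bp upstream of the startpoint" as meaning the last base of the -10 hexamer sits at position -7; both the `minus10_end` parameter and the choice of A as the purine at the startpoint are illustrative assumptions. Spacer bases are written as N because their identity is unimportant.

```python
MINUS35, MINUS10 = "TTGACA", "TATAAT"


def to_position(offset: int) -> int:
    """Convert an offset from the startpoint (startpoint = 0) to promoter
    numbering, in which the startpoint is +1 and there is no position 0."""
    return offset + 1 if offset >= 0 else offset


def optimal_promoter(spacer: int = 17, minus10_end: int = -7, start_base: str = "A"):
    """Assemble -35 box, spacer, -10 box, gap, and startpoint.

    Returns the template sequence and the numbered position of each element.
    """
    gap = -minus10_end - 1  # bases strictly between the -10 box and +1
    seq = MINUS35 + "N" * spacer + MINUS10 + "N" * gap + start_base
    start = len(seq) - 1  # index of the startpoint in `seq`

    def span(first: int, length: int) -> tuple[int, int]:
        # Numbered positions of the first and last base of an element.
        return (to_position(first - start), to_position(first + length - 1 - start))

    layout = {
        "-35 hexamer": span(0, len(MINUS35)),
        "-10 hexamer": span(len(MINUS35) + spacer, len(MINUS10)),
        "startpoint": to_position(0),
    }
    return seq, layout


seq, layout = optimal_promoter()
print(seq)
print(layout)  # {'-35 hexamer': (-35, -30), '-10 hexamer': (-12, -7), 'startpoint': 1}
```

Changing `spacer` to 16 or 18 shifts only the -35 hexamer, which is one way to see why the spacer length, rather than its sequence, sets the geometry of the two sites.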



